Graph memory nodes are graph nodes representing either memory allocation or free actions. As a
shorthand, nodes that allocate memory are called allocation nodes. Likewise, nodes that free memory
are called free nodes. Allocations created by allocation nodes are called graph allocations. CUDA as-
signs virtual addresses for the graph allocation at node creation time. While these virtual addresses
are fixed for the lifetime of the allocation node, the allocation contents are not persistent past the
freeing operation and may be overwritten by accesses referring to a different allocation.
Graph allocations are considered recreated every time a graph runs. A graph allocation’s lifetime, which
differs from the node’s lifetime, begins when GPU execution reaches the allocating graph node and
ends when one of the following occurs:
▶ GPU execution reaches the freeing graph node
▶ GPU execution reaches the freeing cudaFreeAsync() stream call
▶ immediately upon the freeing call to cudaFree()
Note: Graph destruction does not automatically free any live graph-allocated memory, even though it
ends the lifetime of the allocation node. The allocation must subsequently be freed in another graph,
or using cudaFreeAsync()∕cudaFree().
Just like other Graph Structure, graph memory nodes are ordered within a graph by dependency edges.
A program must guarantee that operations accessing graph memory:
▶ are ordered after the allocation node
▶ are ordered before the operation freeing the memory
Graph allocation lifetimes begin and usually end according to GPU execution (as opposed to API invo-
cation). GPU ordering is the order that work runs on the GPU as opposed to the order that the work
is enqueued or described. Thus, graph allocations are considered ‘GPU ordered.
